Abstract: The western flower thrips, Frankliniella occidentalis (Pergande), is an invasive species and the most economically important pest within the insect order Thysanoptera. To better understand the genetic makeup and migration patterns of F. occidentalis throughout the world, we characterized 18 novel polymorphic EST-derived microsatellites (EST-SSRs). We also investigated the mutational mechanisms of these EST-SSRs to facilitate the selection of appropriate marker combinations for population genetic studies. Genetic diversity at these novel markers was assessed in 96 individuals from three populations in China (Harbin, Dali and Guiyang). All 18 loci were highly polymorphic; the number of alleles ranged from 2 to 15, with an average of 5.50 alleles per locus. The observed (HO) and expected (HE) heterozygosities ranged from 0.072 to 0.707 and from 0.089 to 0.851, respectively. Only two locus/population combinations (WFT144 in Dali and WFT50 in Guiyang) deviated significantly from Hardy-Weinberg equilibrium (HWE). Pairwise FST analysis showed low but significant differentiation (FST = 0.026 to 0.032) in all three pairwise population comparisons. Sequence analysis of the alleles at each locus revealed a complex mutational pattern in these EST-SSRs. Thus, these EST-SSRs are useful markers, but greater attention should be paid to their mutational characteristics when they are used in population genetic studies.
1. Introduction

The western flower thrips, Frankliniella occidentalis (Pergande), is the most economically important 
pest within the insect order Thysanoptera, which includes more than 5500 described species [1]. 
F. occidentalis causes enormous damage by directly feeding on greenhouse vegetable and ornamental 
crops and by transmitting plant-pathogenic tospoviruses [2]. F. occidentalis is endemic to North 
America in an area west of the Rocky Mountains from Mexico to Alaska [3]. Since the late 1970s, 
F. occidentalis has rapidly invaded most countries of the world, where it not only causes severe economic losses but also threatens endemic invertebrates and associated ecosystems [4]. To control F. occidentalis effectively, it is first necessary to know its genetic diversity, population structure and invasion history. Genetic tools such as microsatellite markers can reveal the origin of newly established populations, their genetic makeup and their migration routes [5,6].

Microsatellites, or simple sequence repeats (SSRs), consist of tandemly repeated motifs 1-6 bp in length and are widely distributed throughout eukaryotic genomes [7]. Two mutation models have conventionally been considered for microsatellites: the stepwise mutation model (SMM) and the infinite allele model (IAM). The SMM assumes that each mutation adds or removes a single repeat unit, whereas the IAM assumes that every mutation creates a new allele [8]. The mutational mechanism of microsatellites is still debated, although the most likely mechanism is slippage during DNA replication [9]. Other mechanisms, such as insertions/deletions (indels) in the flanking region, may also generate new alleles [10]; Matsuoka showed that the IAM was appropriate for maize microsatellites that mutated in their flanking regions [10]. Knowledge of the mutational pattern of a specific SSR can therefore facilitate the selection of an appropriate mutation model and of suitable marker combinations in population genetic studies. Because of their codominant inheritance, high polymorphism, easy detection by polymerase chain reaction (PCR) and broad distribution in the genome, microsatellites are widely used in population genetic studies [11]; for example, Ascunce et al. used a large number of SSRs to investigate the global invasion routes of the fire ant Solenopsis invicta [6]. However, population genetic studies of F. occidentalis have been hampered by a lack of polymorphic molecular markers, and only six polymorphic microsatellites of F. occidentalis are currently known [12]. Recently, a large number of expressed sequence tags (ESTs) of F. occidentalis have become available in public sequence databases [13], and these can be exploited to identify markers inexpensively. We therefore isolated and characterized 18 novel EST-SSRs for F. occidentalis. These EST-SSRs will allow researchers to investigate the genetic diversity and population genetic structure of F. occidentalis in its native and invaded ranges and to trace its global invasion history.
2. Results and Discussion

2.1. Characteristics of F. occidentalis EST-SSRs

MIcroSAtellite (MISA) analysis [14] identified 309 sequences containing SSRs. Among these sequences, five contained two different SSRs, and three of these were compound microsatellites (Table 1). The EST-SSR frequency of F. occidentalis (1 SSR/24.1 kb) was lower than that of the brown planthopper (1 SSR/13.0 kb [15]), the pea aphid (1 SSR/3.0 kb [16]) and several other insects (~1 SSR/1 kb in fly, silkworm and mosquito [17]), and was comparable to that of some crops (1 SSR/23.80 kb in soybean and 1 SSR/28.32 kb in maize [18]). Dinucleotide repeats (DNRs) were the most abundant motif class (265/314). Among the DNRs, the AC/GT motif was the most common (41.5%), followed by AG/CT (31.7%), AT/AT (22.3%) and CG/CG (4.5%). Repeats were classified according to the method of Jurka and Pethiyagoda [19]; for example, (AC)n, (CA)n, (TG)n and (GT)n were considered the same class because they differ only in reading frame and/or strand. Other repeat motifs, including trinucleotide, tetranucleotide and pentanucleotide repeats, were also observed, albeit infrequently (49/314) (Table 1).
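The class assignment can be sketched in Python. This is an illustrative reading of the Jurka and Pethiyagoda scheme [19], not code from that work or from MISA: it assumes that a class is named by the alphabetically first rotation of the motif or of its reverse complement, paired with the first rotation of that name's reverse complement. Under this assumption it reproduces every class label listed in Table 1.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def rotations(motif: str) -> list:
    """All cyclic rotations of a repeat unit, e.g. 'AC' -> ['AC', 'CA']."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> str:
    """Collapse a repeat unit into a class label such as 'AC/GT'.

    Left label: alphabetically smallest rotation of the motif or of its
    reverse complement. Right label: smallest rotation of the left
    label's reverse complement (assumed naming convention).
    """
    motif = motif.upper()
    left = min(rotations(motif) + rotations(reverse_complement(motif)))
    right = min(rotations(reverse_complement(left)))
    return f"{left}/{right}"


# Motifs as written in Table 2 (WFT24, WFT25, WFT37, WFT64, WFT66) and one from Table 1.
for m in ["TG", "GA", "TTA", "TCC", "ACT", "ACGGC"]:
    print(m, "->", motif_class(m))
# TG -> AC/GT, GA -> AG/CT, TTA -> AAT/ATT, TCC -> AGG/CCT,
# ACT -> ACT/AGT, ACGGC -> ACGGC/CCGTG
```

Any frame or strand of the same repeat (for example CA, AC, TG and GT) maps to a single label, which is what allows the counts in Table 1 to be pooled by class.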

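Table 1. Frequency and distribution of SSRs in the analyzed Frankliniella occidentalis ESTs.

Repeats | n = 4 | n = 5 | n = 6 | n = 7 | n = 8 | Total
--- | --- | --- | --- | --- | --- | ---
AC/GT | | 89 | 14 | 4 | 3 | 110
AG/CT | | 67 | 11 | 6 | | 84
AT/AT | | 54 | 4 | 1 | | 59
CG/CG | | 12 | | | | 12
AAC/GTT | | 7 | 1 | | | 8
AAG/CTT | | 1 | | | | 1
AAT/ATT | | 5 | 1 | | | 6
ACC/GGT | | 4 | 1 | | | 5
ACT/AGT | | 1 | | | | 1
AGC/CTG | | 5 | | | | 5
AGG/CCT | | 1 | 1 | | | 2
ATC/ATG | | 1 | | | | 1
AAAC/GTTT | 3 | | | | | 3
AAAT/ATTT | 3 | | | | | 3
AACC/GGTT | 2 | | | | | 2
AATC/ATTG | 2 | | | | | 2
AATG/ATTC | 3 | | | | | 3
ACAT/ATGT | 1 | | | | | 1
ACTG/AGTC | 1 | | | | | 1
AGAT/ATCT | 3 | | | | | 3
ATGC/ATGC | 1 | | | | | 1
ACGGC/CCGTG | 1 | | | | | 1
NN (DNR) | | 222 | 29 | 11 | 3 | 265
NNN (TNR) | | 25 | 4 | | | 29
NNNN (TTNR) | 19 | | | | | 19
NNNNN (PNR) | 1 | | | | | 1

n, number of repeat units; DNR, dinucleotide repeats; TNR, trinucleotide repeats; TTNR, tetranucleotide repeats; PNR, pentanucleotide repeats.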
Of the primer pairs designed from the 122 sequences suitable for primer design, 72 amplified products of the expected size and 50 yielded larger or no products. Of these 72, 18 primer pairs revealed polymorphism (Table 2); the remaining 54 were either monomorphic or amplified poorly. For the 18 selected ESTs, BLASTX homology searches against the NCBI nr database found three ESTs with significant hits to insect genes at an E-value cutoff of 1e-5 (Table 2); no hit was found for the other 15 ESTs. Sequence length variation in coding sequences is rare, and examination of the three sequences with significant BLAST hits suggested that their SSRs may be located in either the 5' or the 3' untranslated region (UTR). The 15 ESTs of unknown function may also originate from non-coding regions. The selected ESTs are unlikely to derive from non-insect sources, given the high amplification rate (approximately 99.5%) of these EST-SSRs across the 96 F. occidentalis samples. When all published F. occidentalis ESTs were analyzed, 17 of the 18 selected ESTs were singletons, and the remaining one (GT306150) was part of a contig containing only two ESTs. This low copy number suggests that these ESTs are unlikely to be repetitive sequences in the nuclear genome. Across all three populations (Table 3), 99 alleles were identified at the 18 markers; the number of alleles (Na) per locus ranged from 2 to 15, with an average of 5.50. The observed (HO) and expected (HE) heterozygosities ranged from 0.072 to 0.707 and from 0.089 to 0.851, respectively. The PIC values ranged from 0.088 to 0.875, with an average of 0.476 (Table 2).
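The amplification rate and allele totals above can be re-derived from the N and Na columns of Table 2. The Python sketch below assumes that the approximately 99.5% figure means individuals successfully genotyped out of all 96 × 18 individual-by-locus attempts; the text does not define it explicitly.

```python
# (N genotyped individuals, Na alleles) per locus, transcribed from Table 2.
table2 = {
    "WFT20": (96, 5), "WFT24": (96, 2), "WFT25": (96, 14), "WFT28": (96, 3),
    "WFT37": (94, 4), "WFT50": (95, 6), "WFT51": (96, 5), "WFT64": (95, 4),
    "WFT66": (96, 3), "WFT83": (96, 4), "WFT87": (94, 7), "WFT98": (96, 5),
    "WFT104": (96, 4), "WFT108": (96, 4), "WFT124": (95, 4), "WFT139": (96, 3),
    "WFT141": (96, 7), "WFT144": (94, 15),
}
SAMPLED = 96  # individuals collected (Table 3: 31 + 35 + 30)

genotyped = sum(n for n, _ in table2.values())
attempted = SAMPLED * len(table2)          # every individual at every locus
success_rate = genotyped / attempted       # assumed definition of "amplification rate"

total_alleles = sum(na for _, na in table2.values())
mean_alleles = total_alleles / len(table2)

print(f"{genotyped}/{attempted} genotypes = {success_rate:.1%}")  # 1719/1728 genotypes = 99.5%
print(f"{total_alleles} alleles, {mean_alleles:.2f} per locus")   # 99 alleles, 5.50 per locus
```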
Table 2. Characteristics of 18 new EST-SSRs in Frankliniella occidentalis.

Locus | GenBank no. (sequence length) | Putative gene function (a) | Repeat motif | Primer sequence (5'-3') | Size range (bp) | Ta (°C) | HO | HE | N | Na | PIC
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
WFT20 | GT306150 (292 bp) | XP_001945214.1 PREDICTED: uncharacterized protein C14orf138-like [Acyrthosiphon pisum]; 4e-08 | (AC)6 | F: CGTAGCCGCCCAAACTGTT; R: CCTTCCAATTCAAATTCCCT | 144-153 | 52 | 0.654 | 0.657 | 96 | 5 | 0.609
WFT24 | GT303793 (745 bp) | Unknown | (TG)6 | F: ACGAAGTTTGGTTTGGGTGG; R: AAGTTTCCTCCGCTCATTTC | 208-214 | 52 | 0.257 | 0.290 | 96 | 2 | 0.258
WFT25 | GT303588 (608 bp) | Unknown | (GA)7 | F: CACCAGTCGCGTTCATTGA; R: GCCTCCAGCAGCACAAGTA | 96-149 | 52 | 0.707 | 0.851 | 96 | 14 | 0.860
WFT28 | GT303349 (784 bp) | Unknown | (TA)6 | F: GGGCTTGAAATAATGTTCTG; R: GTAAATAAATCAGTGGAGGGT | 91-95 | 52 | 0.241 | 0.224 | 96 | 3 | 0.230
WFT37 | GT302424 (759 bp) | Unknown | (TTA)5 | F: GCATACCCTGTGAACGAGTG; R: ACAGAAGCAAATGTCTACCTGA | 213-222 | 52 | 0.372 | 0.410 | 94 | 4 | 0.384
WFT50 | GT310239 (785 bp) | Unknown | (TG)7 | F: CGGAGTGAGCAGGAGTTGT; R: TTGCCCCTACCAAAATATGA | 171-181 | 52 | 0.313 | 0.458 | 95 | 6 | 0.427
WFT51 | GT310133 (741 bp) | XP_002425957.1 abrupt protein, putative [Pediculus humanus corporis]; 7e-76 | (TG)8 | F: GTACGCAGGAGAAGTAAATG; R: ACAAATCCAGATGGCAACC | 297-305 | 52 | 0.623 | 0.590 | 96 | 5 | 0.555
WFT64 | GT311293 (518 bp) | Unknown | (TCC)6 | F: CTTTTCGGATTCTCCTTCG; R: GGAGACCTGATTCACCGTATG | 243-250 | 52 | 0.590 | 0.539 | 95 | 4 | 0.443
WFT66 | GT305093 (175 bp) | Unknown | (ACT)5 | F: AACTTAGGAAGAAAGACTGTAGA; R: TGTTTACGCACGCACGCAT | 113-116 | 52 | 0.505 | 0.476 | 96 | 3 | 0.432
WFT83 | GT304065 (616 bp) | Unknown | (TA)5 | F: GGAGGTACTGACTAAAGCATG; R: GGGACAGACAAAACAGGAAA | 172-176 | 52 | 0.462 | 0.451 | 96 | 4 | 0.394
WFT87 | GT303951 (830 bp) | Unknown | (TG)5 | F: GGTCTGAACTGTATGGGATG; R: CAGGACCCTAGTATGTAAGAAA | 259-277 | 52 | 0.346 | 0.391 | 94 | 7 | 0.394
WFT98 | GT299180 (436 bp) | Unknown | (TG)5 | F: GGGGCAGTTTGCTCTTGT; R: CTGTTCATGGTCACTTTGG | 145-153 | 52 | 0.645 | 0.636 | 96 | 5 | 0.587
WFT104 | GT298583 (602 bp) | XP_002063687.1 GK15779 [Drosophila willistoni]; 4e-17 | (GT)5 | F: TCACGCAAGCTAACAGCCCT; R: ACAAAGTTGCCTGCCTGAAT | 150-159 | 52 | 0.478 | 0.473 | 96 | 4 | 0.427
WFT108 | GT300460 (686 bp) | Unknown | (AT)5 | F: AGGATAGCTTGTTTTGTTGG; R: CCATTTGTAACTAGCGTAGGA | 135-140 | 52 | 0.355 | 0.350 | 96 | 4 | 0.328
WFT124 | GT311015 (1114 bp) | Unknown | (TG)5 | F: CATTATGTGCCTCACCTCCG; R: GCCTCAATTCTTCCTTGCG | 278-281 | 52 | 0.521 | 0.664 | 95 | 4 | 0.626
WFT139 | GT301579 (439 bp) | Unknown | (TC)5 | F: CATGGGTCCTTCCAGTGAG; R: GCGAAACCTATCCCCTTATC | 138-142 | 52 | 0.072 | 0.089 | 96 | 3 | 0.088
WFT141 | GT310355 (612 bp) | Unknown | (GT)5 | F: GCTTTTGCATACCTTGTCTTC; R: GGTAAGGGCCGGTTTTGTT | 174-183 | 52 | 0.588 | 0.678 | 96 | 7 | 0.651
WFT144 | GT310315 (766 bp) | Unknown | (GT)5 | F: TCGCAGAAGTTTGTGGTGAG; R: GAGCCGATAAAAGTAGTGGAG | 94-137 | 52 | 0.598 | 0.844 | 94 | 15 | 0.875
Mean | | | | | | | 0.463 | 0.504 | | 5.500 | 0.476

Ta, annealing temperature; N, number of analyzed individuals; Na, number of alleles detected; HO, observed heterozygosity; HE, expected heterozygosity; PIC, polymorphism information content. (a) Based on BLASTX analysis; the accession number, description and species source of the best hit are given, together with the E-value of the match.
Table 3. Collection information for samples used in this study.

Location | Nb samples | Coordinates | Sampling date | Host
--- | --- | --- | --- | ---
Harbin | 31 | 45°44'51.13"N, 126°38'2.54"E | 23 July 2010 | Tagetes erecta L.; Hosta ventricosa (Salisb.) Stearn
Dali | 35 | 25°36'17.49"N, 100°14'49.75"E | 30 July 2011 | Petunia hybrida Vilm.; Nicandra physalodes; Canna indica L.
Guiyang | 30 | 26°37'40.97"N, 106°49'23.06"E | 26 July 2011 | Solanum melongena L.; Cucurbita pepo L.

Nb samples, number of samples.



After sequential Bonferroni correction for multiple tests, only WFT144 in Dali and WFT50 in Guiyang deviated significantly from Hardy-Weinberg equilibrium (HWE), possibly owing to the presence of null alleles; MICRO-CHECKER [20] analysis supported this explanation (Table 4). No evidence of stuttering, large-allele dropout or significant genotypic linkage disequilibrium was detected. Dali displayed the highest mean number of alleles (Na = 4.944) and, together with Guiyang, the highest mean expected heterozygosity (HE = 0.522), whereas Harbin had the lowest values (Na = 4.389; HE = 0.468) (Table 4).
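For readers unfamiliar with the correction, the sketch below implements the sequential (Holm) form of the Bonferroni procedure in Python. The p-values are hypothetical placeholders, not results from this study, and the text does not state whether the test family was each population's 18 loci or all 54 locus/population combinations.

```python
def holm_bonferroni(p_values: dict, alpha: float = 0.05) -> set:
    """Holm's sequential Bonferroni procedure.

    Sort p-values ascending and compare the k-th smallest (k = 1..m) with
    alpha / (m - k + 1); stop at the first test that is not significant.
    Returns the keys whose null hypothesis (here, HWE) is rejected.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for k, (key, p) in enumerate(ordered, start=1):
        if p <= alpha / (m - k + 1):
            rejected.add(key)
        else:
            break  # all remaining (larger) p-values are retained
    return rejected


# Hypothetical p-values for four locus/population HWE tests.
example = {"locus_A": 0.0004, "locus_B": 0.015, "locus_C": 0.03, "locus_D": 0.6}
print(sorted(holm_bonferroni(example)))  # ['locus_A', 'locus_B']
```

With a single Bonferroni threshold of 0.05/4 = 0.0125, locus_B would not be rejected; the sequential version relaxes the threshold step by step and is therefore less conservative while still controlling the family-wise error rate.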

Table 4. Population genetic parameters at 18 microsatellite loci in 3 Frankliniella occidentalis populations.

Locus | Harbin N | Harbin Na | Harbin HO | Harbin HE | Harbin r | Dali N | Dali Na | Dali HO | Dali HE | Dali r | Guiyang N | Guiyang Na | Guiyang HO | Guiyang HE | Guiyang r
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
WFT20 | 31 | 4 | 0.613 | 0.618 | -0.003 | 35 | 5 | 0.714 | 0.671 | -0.039 | 30 | 5 | 0.633 | 0.684 | 0.035
WFT24 | 31 | 2 | 0.452 | 0.437 | -0.017 | 35 | 2 | 0.086 | 0.133 | 0.110 | 30 | 2 | 0.233 | 0.299 | 0.095
WFT25 | 31 | 10 | 0.613 | 0.811 | 0.119 | 35 | 11 | 0.743 | 0.871 | 0.069 | 30 | 12 | 0.767 | 0.873 | 0.064
WFT28 | 31 | 3 | 0.290 | 0.252 | -0.155 | 35 | 3 | 0.400 | 0.355 | -0.062 | 30 | 3 | 0.033 | 0.065 | 0.149
WFT37 | 30 | 3 | 0.200 | 0.238 | 0.062 | 35 | 3 | 0.400 | 0.483 | 0.078 | 29 | 4 | 0.517 | 0.510 | 0.007
WFT50 | 31 | 6 | 0.290 | 0.409 | 0.112 | 34 | 5 | 0.382 | 0.476 | 0.076 | 30 | 5 | 0.267* | 0.491 | 0.193
WFT51 | 31 | 4 | 0.613 | 0.558 | -0.069 | 35 | 5 | 0.657 | 0.616 | -0.040 | 30 | 5 | 0.600 | 0.596 | -0.012
WFT64 | 31 | 3 | 0.548 | 0.529 | -0.019 | 34 | 3 | 0.588 | 0.543 | -0.056 | 30 | 3 | 0.633 | 0.546 | -0.084
WFT66 | 31 | 3 | 0.387 | 0.395 | 0.013 | 35 | 3 | 0.629 | 0.568 | -0.049 | 30 | 3 | 0.500 | 0.466 | -0.023
WFT83 | 31 | 3 | 0.548 | 0.505 | -0.047 | 35 | 4 | 0.371 | 0.348 | -0.049 | 30 | 3 | 0.467 | 0.499 | 0.036
WFT87 | 30 | 5 | 0.200 | 0.187 | -0.102 | 34 | 6 | 0.471 | 0.501 | 0.009 | 30 | 6 | 0.367 | 0.484 | 0.106
WFT98 | 31 | 4 | 0.516 | 0.546 | 0.027 | 35 | 4 | 0.686 | 0.644 | -0.043 | 30 | 5 | 0.733 | 0.718 | -0.011
WFT104 | 31 | 3 | 0.419 | 0.370 | -0.064 | 35 | 4 | 0.514 | 0.573 | 0.051 | 30 | 4 | 0.500 | 0.474 | -0.027
WFT108 | 31 | 3 | 0.355 | 0.297 | -0.193 | 35 | 4 | 0.343 | 0.394 | 0.093 | 30 | 3 | 0.367 | 0.359 | 0.026
WFT124 | 31 | 4 | 0.452 | 0.618 | 0.129 | 34 | 4 | 0.412 | 0.654 | 0.176 | 30 | 4 | 0.700 | 0.722 | 0.019
WFT139 | 31 | 3 | 0.097 | 0.151 | 0.114 | 35 | 3 | 0.086 | 0.083 | -0.043 | 30 | 2 | 0.033 | 0.033 | -0.017
WFT141 | 31 | 5 | 0.516 | 0.668 | 0.103 | 35 | 7 | 0.514 | 0.601 | 0.069 | 30 | 5 | 0.733 | 0.766 | 0.020
WFT144 | 31 | 11 | 0.645 | 0.838 | 0.116 | 34 | 13 | 0.529* | 0.881 | 0.196 | 29 | 11 | 0.621 | 0.812 | 0.124
Mean | | 4.389 | 0.431 | 0.468 | 0.007 | | 4.944 | 0.474 | 0.522 | 0.030 | | 4.722 | 0.484 | 0.522 | 0.039

N, number of analyzed individuals; Na, number of alleles detected; HO, observed heterozygosity; HE, expected heterozygosity; r, null allele frequency; * deviation from Hardy-Weinberg equilibrium after sequential Bonferroni correction for multiple tests (P < 0.05).
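The Mean row of Table 4 consists of unweighted averages over the 18 loci. Recomputing them from the per-locus columns, as in the Python sketch below (values transcribed from Table 4), reproduces the reported means and shows that Dali and Guiyang share the same mean HE to three decimal places.

```python
# Per-locus values in Table 4 order (WFT20, WFT24, ..., WFT144).
na = {
    "Harbin":  [4, 2, 10, 3, 3, 6, 4, 3, 3, 3, 5, 4, 3, 3, 4, 3, 5, 11],
    "Dali":    [5, 2, 11, 3, 3, 5, 5, 3, 3, 4, 6, 4, 4, 4, 4, 3, 7, 13],
    "Guiyang": [5, 2, 12, 3, 4, 5, 5, 3, 3, 3, 6, 5, 4, 3, 4, 2, 5, 11],
}
he = {
    "Harbin":  [0.618, 0.437, 0.811, 0.252, 0.238, 0.409, 0.558, 0.529, 0.395,
                0.505, 0.187, 0.546, 0.370, 0.297, 0.618, 0.151, 0.668, 0.838],
    "Dali":    [0.671, 0.133, 0.871, 0.355, 0.483, 0.476, 0.616, 0.543, 0.568,
                0.348, 0.501, 0.644, 0.573, 0.394, 0.654, 0.083, 0.601, 0.881],
    "Guiyang": [0.684, 0.299, 0.873, 0.065, 0.510, 0.491, 0.596, 0.546, 0.466,
                0.499, 0.484, 0.718, 0.474, 0.359, 0.722, 0.033, 0.766, 0.812],
}


def mean(values):
    """Unweighted arithmetic mean across loci."""
    return sum(values) / len(values)


for pop in na:
    print(f"{pop}: mean Na = {mean(na[pop]):.3f}, mean HE = {mean(he[pop]):.3f}")
# Harbin: mean Na = 4.389, mean HE = 0.468
# Dali: mean Na = 4.944, mean HE = 0.522
# Guiyang: mean Na = 4.722, mean HE = 0.522
```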
2.2. Mutations of EST-SSRs

Ninety-five different alleles, together accounting for approximately 98% of the total allele frequency, were successfully sequenced, and all sequences corresponded exactly to the expected EST sequences. The remaining four rare, low-frequency alleles were not sequenced. Sequence analysis revealed three types of mutational events responsible for generating new alleles (Figure 1 shows six loci illustrating the three patterns; the other 12 loci are shown in Figures S1 and S2). First, for seven microsatellites, allele size variation was explained by differences in the number of repeat units (Figure 1A, Figure S1). Second, at the WFT51, WFT83, WFT108 and WFT124 loci, two different repeat motifs were found, one of them in the flanking region, and both contributed to allele size variation (Figure 1B, Figure S2A). Third, indels in the flanking region were observed at seven loci (WFT20, WFT66, WFT87, WFT104, WFT139, WFT141 and WFT144; Figure 1C, Figure S2B), although the frequencies of these alleles were very low at four loci (WFT20: 0.088; WFT66: 0.088; WFT87: 0.144; WFT139: 0.021). In addition to these three patterns, base substitutions were observed in the repeat region at nine loci and in the flanking region at nine loci; these did not contribute to length changes. Several loci combined more than one of these mutation types; for example, WFT37 showed both base substitutions in the flanking region and stepwise mutation in the repeat region, and WFT104 showed both flanking-region indels and stepwise mutation in the repeat motif. A minimum number of contiguous repeats may be necessary for slippage to occur; this threshold has been suggested to be four for dinucleotide repeats and two for tri- and tetranucleotide repeats [21,22]. Using these criteria, we calculated the frequencies of slippage-consistent and slippage-inconsistent mutations. Allele size variation came mainly from slippage at the repeat motifs: 66 alleles from the 18 loci, with a combined frequency of 75.5%, showed this mechanism. Twenty-two alleles from seven loci (frequency 19.9%) showed slippage in the flanking region. Mutation mechanisms other than slippage also occurred, with frequencies of 12.6% (20 alleles from six loci) in the repeat motifs and 8.4% (11 alleles from seven loci) in the flanking region. Because a single allele can carry more than one type of mutation, these categories overlap and their frequencies sum to more than 100%. Overall, slippage in the repeat motif and flanking region was the main mutation mechanism in the newly developed microsatellites.
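The minimum-repeat criterion can be expressed as a small check: find the longest uninterrupted run of a motif (in any rotation) and compare it with the threshold for that unit length. This Python sketch is an illustration of the criterion cited above [21,22], not the authors' procedure, and the example sequences are hypothetical.

```python
import re

# Minimum contiguous perfect repeats suggested as necessary for slippage [21,22]:
# four for dinucleotide units, two for tri- and tetranucleotide units.
MIN_CONTIGUOUS = {2: 4, 3: 2, 4: 2}


def longest_run(seq: str, motif: str) -> int:
    """Units in the longest uninterrupted tandem run of `motif`, in any rotation."""
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for i in range(len(motif)):
        unit = motif[i:] + motif[:i]  # e.g. 'TA' is also tested as 'AT'
        for match in re.finditer("(?:" + re.escape(unit) + ")+", seq):
            best = max(best, len(match.group()) // len(unit))
    return best


def slippage_possible(seq: str, motif: str) -> bool:
    """True if `seq` holds enough contiguous copies of `motif` for slippage."""
    threshold = MIN_CONTIGUOUS.get(len(motif))
    if threshold is None:
        raise ValueError(f"no threshold defined for {len(motif)}-bp motifs")
    return longest_run(seq, motif) >= threshold


print(slippage_possible("GGTATATATACC", "TA"))  # True: (TA)4
print(slippage_possible("GGTATATACC", "TA"))    # False: only (TA)3
print(slippage_possible("CTTATTACC", "TTA"))    # True: (TTA)2
```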

Length changes in microsatellite DNA are generally thought to arise from replication slippage [9]. However, we observed a complex mutational pattern in F. occidentalis EST-SSRs. Similar mutational patterns (changes in the number of repeat units, base substitutions and indels within the flanking region) have also been found in microsatellites of other insects [15,23] and other organisms, including maize [24] and birds [25], suggesting that complex mutational patterns are common in eukaryotic genomes. Zhu et al. showed that indel slippage, or length-independent slippage, tends to duplicate short sequences [26]. The number of repeat units in F. occidentalis EST-SSRs was low (n ≤ 8; Table 1), suggesting that indel slippage may be responsible for the complex mutational pattern of EST-SSRs in F. occidentalis.
Figure 1. Mutational patterns of EST-SSRs in Frankliniella occidentalis. (A) Microsatellites mutated in the repeat motif; (B) microsatellites with a second SSR in the flanking region; (C) microsatellites with indels in the flanking region. Each base is indicated by a different color. Repeat motifs are shown in black boxes; red boxes indicate repeat motifs in the flanking region, and blue boxes indicate indels in the flanking region.

[Sequence alignments not reproduced. Panel A: WFT37 (GT302424; repeat motif TTA; alleles a-213, b-216, c-219, d-222) and WFT50 (GT310239; repeat motif TG; alleles a-171, b-173, c-175, d-179, e-181). Panel B: WFT124 (GT311015; repeat motif CA; alleles a-278, b-279, c-280, d-281) and WFT83 (GT304065; repeat motif TA; alleles a-172, b-173, c-174, d-176). Panel C: WFT20 (GT306150; repeat motif AC; alleles a-144, b-145, c-149, d-151, e-153) and WFT104 (GT298583; repeat motif GT; alleles a-150, b-152, c-154, d-159).]

Global and pairwise FST and RST among the three populations were then calculated; FST assumes an infinite allele model and RST a stepwise mutation model [27]. Global FST and RST across all 18 loci showed low but significant differentiation among the three populations (global FST = 0.029, P < 0.001; global RST = 0.023, P < 0.001; Table 5). Including the loci with a second SSR and/or indels in the flanking region did not significantly change the global FST and RST values, whose 95% confidence intervals overlapped (Table 5). For the same combination of loci, the global and pairwise FST and RST values did not differ significantly from each other (overlapping 95% confidence intervals; Table 5). However, no clear correlation was found between the pairwise estimates of FST and RST (Spearman r = -0.202; P = 0.264). The pairwise FST results were consistent in all cases: Dali/Guiyang showed the lowest differentiation and Harbin/Guiyang the highest. This pattern, however, was not reflected in the pairwise RST results.
This might be because the microsatellites that mutated in the flanking region do not strictly conform to the IAM and/or the SMM. Eleven of the 18 loci showed multiple sources of length variation that cannot be explained solely by the gain or loss of one or two repeats, as SMM-based models assume. Thus, methods based on the IAM might be appropriate for many loci in our study, although this was not supported by our analysis. Anderson also suggested that the IAM was more appropriate for the microsatellites of the parasite Plasmodium falciparum, which have complex mutation patterns [28]. Furthermore, the precision of global differentiation estimates generally improves (the confidence intervals narrow) as the number of loci analyzed increases (Table 5). Users of these microsatellites who need precise estimates should therefore use more loci.
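One quick screen for loci that cannot follow a strict stepwise model uses allele sizes alone: if every length change were a whole repeat unit, each allele would differ from the smallest allele by a multiple of the unit length. The Python sketch below applies this check to the allele sizes labelled in Figure 1 (for example WFT20 alleles of 144, 145, 149, 151 and 153 bp). It is a necessary but not sufficient test and is offered only as an illustration: a flanking indel whose length is a multiple of the repeat unit would pass unnoticed, which is why allele sequencing was needed here.

```python
def size_residue_classes(sizes, motif_len):
    """Group allele sizes (bp) by offset from the smallest allele, modulo unit length."""
    base = min(sizes)
    classes = {}
    for size in sorted(sizes):
        classes.setdefault((size - base) % motif_len, []).append(size)
    return classes


def requires_non_stepwise_change(sizes, motif_len) -> bool:
    """True if some alleles differ by a length that is not a whole number of repeat units."""
    return len(size_residue_classes(sizes, motif_len)) > 1


# Allele sizes and repeat-unit lengths read from the Figure 1 panel labels.
loci = {
    "WFT37 (TTA)": ([213, 216, 219, 222], 3),
    "WFT50 (TG)": ([171, 173, 175, 179, 181], 2),
    "WFT83 (TA)": ([172, 173, 174, 176], 2),
    "WFT20 (AC)": ([144, 145, 149, 151, 153], 2),
    "WFT104 (GT)": ([150, 152, 154, 159], 2),
}
for name, (sizes, unit_len) in loci.items():
    print(name, requires_non_stepwise_change(sizes, unit_len))
```

The check flags WFT83, WFT20 and WFT104 and passes WFT37 and WFT50, in line with the panel assignments described in the Figure 1 caption.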



Table 5. Pairwise FST (below the diagonal) and RST (above the diagonal) matrices using different combinations of EST-SSRs of Frankliniella occidentalis.

Number of loci | Population | Harbin | Dali | Guiyang | Global FST | Global RST
--- | --- | --- | --- | --- | --- | ---
7 loci (a) | Harbin | | 0.025 (0.008 to 0.074) | 0.006 (-0.006 to 0.066) | 0.022 (0.007 to 0.044) | 0.020 (0.011 to 0.067)
 | Dali | 0.033 (0.007 to 0.074) | | 0.021 (0.007 to 0.071) | |
 | Guiyang | 0.034 (0.016 to 0.049) | 0.005 (-0.010 to 0.026) | | |
11 loci (b) | Harbin | | 0.024 (0.013 to 0.068) | 0.020 (0.010 to 0.069) | 0.030 (0.011 to 0.056) | 0.030 (0.023 to 0.069)
 | Dali | 0.031 (0.008 to 0.057) | | 0.040 (0.026 to 0.087) | |
 | Guiyang | 0.032 (0.021 to 0.041) | 0.029 (0.002 to 0.085) | | |
14 loci (c) | Harbin | | 0.023 (0.015 to 0.061) | 0.008 (0.002 to 0.051) | 0.025 (0.015 to 0.036) | 0.016 (0.014 to 0.051)
 | Dali | 0.031 (0.014 to 0.055) | | 0.011 (0.007 to 0.052) | |
 | Guiyang | 0.033 (0.019 to 0.048) | 0.012 (-0.001 to 0.027) | | |
18 loci | Harbin | | 0.022 (0.017 to 0.056) | 0.016 (0.011 to 0.056) | 0.029 (0.016 to 0.046) | 0.023 (0.022 to 0.057)
 | Dali | 0.030 (0.014 to 0.047) | | 0.025 (0.019 to 0.063) | |
 | Guiyang | 0.032 (0.020 to 0.045) | 0.026 (0.004 to 0.062) | | |

(a) The 7 loci mutated in the repeat motif; (b) the 7 loci mutated in the repeat motif plus the 4 loci with a second SSR in the flanking region; (c) the 7 loci mutated in the repeat motif plus the 7 loci with indels in the flanking region. Bold indicates significance after Bonferroni correction (P = 0.05). Values in parentheses are 95% confidence intervals.



3. Experimental Section 

3.1. EST Database Mining 

A total of 13,839 F. occidentalis EST sequences were obtained from GenBank [29]. EST-trimmer [30] was then used to remove poly(A/T) stretches from the 5' or 3' ends until no (A)5 or (T)5 stretch remained within the terminal 50 bp. EST sequences shorter than 100 bp were excluded, and those longer than 700 bp were clipped at their 5' end to avoid including low-quality sequence [31]. The remaining sequences were screened with MISA [14] for microsatellites containing at least five di-, five tri-, four tetra-, four penta- or four hexanucleotide repeat units. PCR primers flanking the microsatellite repeats were designed using Primer Premier 5.0 [32]. The selected ESTs were compared with the NCBI nr protein database using BLASTX, and a suggested cut-off E-value of 1e-5 was used to assign a potential homologue to each EST sequence [13].
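The repeat thresholds above can be illustrated with a simplified perfect-repeat finder in Python. It is not MISA and does not reproduce its output: it assumes perfect, uninterrupted tracts, ignores mononucleotide repeats, and does not merge neighbouring SSRs into compound microsatellites as MISA can. The example sequence is hypothetical.

```python
# Minimum numbers of repeat units per unit length used in the MISA screen above.
MIN_REPEATS = {2: 5, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit: str) -> bool:
    """False for units that are repeats of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list:
    """Return sorted (start, unit, copies) tuples for perfect SSRs meeting MIN_REPEATS."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Count perfect, uninterrupted copies of `unit` starting at position i.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit) and set(unit) <= set("ACGT"):
                hits.append((i, unit, copies))
                i += copies * k  # jump past the tract so it is reported once
            else:
                i += 1
    return sorted(hits)


example = "GGCT" + "TG" * 6 + "AACG" + "TTA" * 5 + "CCG"
print(find_ssrs(example))  # [(4, 'TG', 6), (20, 'TTA', 5)]
```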

3.2. Sample Collection and DNA Extraction 

In total, 96 adult F. occidentalis females were sampled from three sites in China between July 2010 and July 2011 (Table 3). Total genomic DNA was extracted by homogenizing a single adult female in 50 µL of STE buffer (100 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) in a 1.5 mL Eppendorf tube. The mixture was incubated with 2 µL proteinase K (10 mg/mL) at 37 °C for 30 min, followed by 5 min at 95 °C. The samples were centrifuged briefly and either used immediately for PCR or stored at -20 °C.

3.3. Primer Testing 

The forward primer of each set was tailed with U19 (GGTTTTCCCAGTCACGACG) to facilitate labeling. PCR amplifications were performed on an Applied Biosystems Veriti™ Thermal Cycler (Applied Biosystems). Each 10 µL amplification mixture contained 1× PCR buffer, 0.2 mM of each dNTP, ~50 ng of DNA, 0.25 units of Maxima Hot Start Taq DNA polymerase (Fermentas, Canada), 0.04 µM of the forward primer, and 0.2 µM each of the reverse primer and the dye-labeled U19 primer (FAM, VIC, NED or PET). Cycling conditions were an initial denaturation for 4 min at 95 °C; 10 cycles of 95 °C for 30 s, 51 °C for 30 s and 72 °C for 30 s; 25 cycles of 95 °C for 30 s, 54 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 10 min. PCR products were run on an ABI 3130 capillary sequencer with the GeneScan-500 LIZ size standard, and allele sizes were determined using GeneMapper version 4.0 (Applied Biosystems).
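The tailing scheme can be sketched as follows, assuming the U19 sequence is simply added to the 5' end of each locus-specific forward primer so that the dye-labeled U19 primer can anneal to products once the tail has been incorporated. The function names are hypothetical, and the text does not state whether the size ranges in Table 2 include the 19-nt tail.

```python
U19 = "GGTTTTCCCAGTCACGACG"  # universal tail given in the Methods (19 nt)


def tailed_forward_primer(locus_primer: str) -> str:
    """5'->3' sequence of a forward primer carrying the U19 tail at its 5' end."""
    return U19 + locus_primer.upper()


def labeled_product_size(untailed_size: int) -> int:
    """Expected size of the dye-labeled product: the tail adds len(U19) bases."""
    return untailed_size + len(U19)


primer = tailed_forward_primer("CGTAGCCGCCCAAACTGTT")  # WFT20 forward primer (Table 2)
print(primer, len(primer))  # GGTTTTCCCAGTCACGACGCGTAGCCGCCCAAACTGTT 38
```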

3.4. Allele Sequencing 

Different alleles detected at each locus by capillary electrophoresis were amplified in 50 µL PCR reactions with unlabeled primers (conditions as above, but with an annealing temperature of 52 °C). The PCR products were purified using an Axygen cleanup kit, ligated into the pGEM-T vector (Promega) and introduced into Escherichia coli DH5α cells. Six positive clones for each allele were sequenced to exclude PCR artefacts. Alignments of the sequenced alleles were generated using Clustal X 2.0.11 [33], and several loci were then aligned manually using BioEdit 7.0.4 [34].

3.5. Data Analysis 

All genetic statistics were calculated from the genotype data of the three populations. 
MICRO-CHECKER 2.2.3 was used to detect genotyping errors due to null alleles, stuttering, or allele 
dropout using 1000 randomizations [20]. The program Genepop 4.0.10 [35] was used to test for 
linkage disequilibrium between pairs of loci in each population (100 batches, 1000 iterations per batch) 
and for deviations from Hardy-Weinberg equilibrium (HWE) at each locus/population combination 
using Fisher's exact tests. Population genetic diversity indices, including the total number of alleles 
per locus, observed heterozygosity (HO), expected heterozygosity (HE) and mean number of alleles, 
were assessed using GenAlEx 6.41 [36]. We also calculated the polymorphism information content 
(PIC) using CERVUS version 3.0 [37]. Pairwise FST and RST values and their significance for 
each population comparison were calculated with 10,000 permutations in Arlequin 3.0 [38] and RST 
CALC 2.2 [27], respectively. 
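For reference, the per-locus indices named above follow standard formulas: HO is the proportion of heterozygous individuals, HE = 1 − Σ p_i² (with the small-sample correction 2N/(2N − 1) · HE), and PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i. The minimal Python sketch below assumes diploid genotypes stored as pairs of allele sizes with no missing data. It is an illustration of the formulas, not a reimplementation of GenAlEx or CERVUS, which may differ in details such as whether HE is reported with the correction.

```python
from collections import Counter
from itertools import combinations

# A diploid genotype: the two allele sizes (bp) scored at one locus.
Genotype = tuple[int, int]


def allele_frequencies(genotypes: list[Genotype]) -> dict[int, float]:
    """Relative frequency of each allele among all 2N gene copies."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_het(genotypes: list[Genotype]) -> float:
    """H_O: proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_het(freqs: dict[int, float]) -> float:
    """H_E = 1 - sum(p_i^2), expected heterozygosity under HWE."""
    return 1.0 - sum(p * p for p in freqs.values())


def unbiased_expected_het(freqs: dict[int, float], n_individuals: int) -> float:
    """Small-sample corrected H_E: H_E * 2N / (2N - 1)."""
    two_n = 2 * n_individuals
    return expected_het(freqs) * two_n / (two_n - 1)


def pic(freqs: dict[int, float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    return (1.0
            - sum(x * x for x in p)
            - sum(2 * x * x * y * y for x, y in combinations(p, 2)))
```

For four hypothetical individuals genotyped as (150, 152), (150, 150), (152, 154) and (150, 154), the allele frequencies are 0.5, 0.25 and 0.25, giving HO = 0.75, HE = 0.625 and PIC ≈ 0.555.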
4. Conclusions 

In summary, 18 highly polymorphic EST-SSRs have been specifically developed for F. occidentalis 
in this study. Sequence analysis of the alleles at each locus revealed a complex mutational pattern in 
these EST-SSRs. Thus, these EST-SSRs are useful markers for the invasive species F. occidentalis, but 
greater attention should be paid to the mutational characteristics of these markers when they are used 
in population genetic studies. 

Acknowledgments 

We thank Da-Song Chen and Yan-Kai Zhang of the Department of Entomology, Nanjing 
Agricultural University (NJAU) for help with the collection of western flower thrips. We are also 
grateful to Kai-Jun Zhang, Ming-Zhi Yu and Jin-Bo Li of the Department of Entomology, NJAU for 
their kind help with experiments. 

This work was supported by grants from the National Key Basic Research Program (973 Program, 
No. 2009CB119200) from the Ministry of Science and Technology of China and the Science and 
Technology Research Program of the National Agricultural Public Welfare Fund (No. 200803025) 
from the Ministry of Agriculture of China. 

References 

1. Mound, L.A.; Morris, D.C. The insect order Thysanoptera: classification versus systematics. 
Zootaxa 2007, 1668, 395-411. 

2. Reitz, S.R. Biology and ecology of the western flower thrips (Thysanoptera: Thripidae): The 
making of a pest. Fla. Entomol. 2009, 92, 7-13. 

3. Bryan, D.E.; Smith, R.F. The Frankliniella occidentalis (Pergande) complex in California. Univ. 
Calif. Publ. Entomol. 1956, 10, 359-410. 

4. Kirk, W.D.J.; Terry, L.I. The spread of the western flower thrips Frankliniella occidentalis 
(Pergande). Agric. For. Entomol. 2003, 5, 301-310. 

5. Behura, S.K. Molecular marker systems in insects: current trends and future avenues. Mol. Ecol. 
2006, 15, 3087-3113. 

6. Ascunce, M.S.; Yang, C.C.; Oakey, J.; Calcaterra, L.; Wu, W.J.; Shih, C.J.; Goudet, J.; 
Ross, K.G.; Shoemaker, D. Global invasion history of the fire ant Solenopsis invicta. Science 
2011, 331, 1066-1068. 

7. Toth, G.; Gaspari, Z.; Jurka, J. Microsatellites in different eukaryotic genomes: Survey and 
analysis. Genome Res. 2000, 10, 967-981. 

8. Bhargava, A.; Fuentes, F.F. Mutational dynamics of microsatellites. Mol. Biotechnol. 2010, 
44, 250-266. 

9. Ellegren, H. Microsatellites: Simple sequences with complex evolution. Nat. Rev. Genet. 2004, 
5, 435-445. 

10. Matsuoka, Y.; Mitchell, S.E.; Kresovich, S.; Goodman, M.; Doebley, J. Microsatellites in 
Zea-variability, patterns of mutations, and use for evolutionary studies. Theor. Appl. Genet. 2002, 
104, 436-450. 



Int. J. Mol. Sci. 2012, 13 

11. Bruford, M.W.; Wayne, R.K. Microsatellites and their application to population genetic studies. 
Curr. Opin. Genet. Dev. 1993, 3, 939-943. 

12. Brunner, P.C.; Frey, J.E. Isolation and characterization of six polymorphic microsatellite loci in 
the western flower thrips Frankliniella occidentalis (Insecta, Thysanoptera). Mol. Ecol. Notes 
2004, 4, 599-601. 

13. Rotenberg, D.; Whitfield, A.E. Analysis of expressed sequence tags for Frankliniella occidentalis, 
the western flower thrips. Insect Mol. Biol. 2010, 19, 537-551. 

14. Thiel, T. MISA: Microsatellite identification tool. Available online: 
http://pgrc.ipk-gatersleben.de/misa/misa.html (accessed on 15 August 2011). 

15. Sun, J.T.; Zhang, Y.K.; Ge, C.; Hong, X.Y. Mining and characterization of sequence tagged 
microsatellites from the brown planthopper Nilaparvata lugens. J. Insect Sci. 2011, 11, 
134:1-134:11. 

16. Weng, Y.; Azhaguvel, P.; Michels, G.J.; Rudd, J.C. Cross-species transferability of microsatellite 
markers from six aphid (Hemiptera: Aphididae) species and their use for evaluating biotypic 
diversity in two cereal aphids. Insect Mol. Biol. 2007, 16, 613-622. 

17. Li, B.; Xia, Q.; Lu, C.; Zhou, Z.; Xiang, Z. Analysis on frequency and density of microsatellites in 
coding sequences of several eukaryotic genomes. Genomics Proteomics Bioinforma. 2004, 2, 
24-31. 

18. Gao, L.; Tang, J.; Li, H.; Jia, J. Analysis of microsatellites in major crops assessed by 
computational and experimental approaches. Mol. Breed. 2003, 12, 245-261. 

19. Jurka, J.; Pethiyagoda, C. Simple repetitive DNA sequences from primates: Compilation and 
analysis. J. Mol. Evol. 1995, 40, 120-126. 

20. van Oosterhout, C.; Hutchinson, W.F.; Wills, D.P.M.; Shipley, P. MICRO-CHECKER: Software 
for identifying and correcting genotyping errors in microsatellite data. Mol. Ecol. Notes 2004, 
4, 535-538. 

21. Messier, W.; Li, S.H.; Stewart, C.B. The birth of microsatellites. Nature 1996, 381, 483. 

22. Zhu, Y.; Queller, D.C.; Strassmann, J.E. A phylogenetic perspective on sequence evolution in 
microsatellite loci. J. Mol. Evol. 2000, 50, 324-338. 

23. Colson, I.; Goldstein, D.B. Evidence for complex mutations at microsatellite loci in Drosophila. 
Genetics 1999, 152, 617-627. 

24. Lia, V.V.; Bracco, M.; Gottlieb, A.M.; Poggio, L.; Confalonieri, V.A. Complex mutational 
patterns and size homoplasy at maize microsatellite loci. Theor. Appl. Genet. 2007, 115, 981-991. 

25. Primmer, C.R.; Ellegren, H. Patterns of molecular evolution in avian microsatellites. Mol. Biol. 
Evol. 1998, 15, 997-1008. 

26. Zhu, Y.; Strassmann, J.E.; Queller, D.C. Insertions, substitutions, and the origin of microsatellites. 
Genet. Res. 2000, 76, 227-236. 

27. Goodman, S.J. RST CALC: A collection of computer programs for calculating unbiased estimates 
of genetic differentiation and determining their significance for microsatellite data. Mol. Ecol. 
1997, 6, 881-885. 



28. Anderson, T.J.C.; Su, X.Z.; Roddam, A.W.; Day, K.P. Complex mutations in a high proportion of 
microsatellite loci from the protozoan parasite Plasmodium falciparum. Mol. Ecol. 2000, 9, 
1599-1608. 

29. National Center for Biotechnology Information Expressed Sequence Tags database. Available 
online: http://www.ncbi.nlm.nih.gov/dbEST/ (accessed on 15 August 2011). 

30. EST trimmer. Available online: http://pgrc.ipk-gatersleben.de/misa/download/est_trimmer.pl 
(accessed on 15 August 2011). 

31. Thiel, T.; Michalek, W.; Varshney, R.K.; Graner, A. Exploiting EST databases for the 
development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). 
Theor. Appl. Genet. 2003, 106, 411-422. 

32. Lalitha, S. Primer Premier 5.0. Biotech Software Internet Report 2000, 1, 270-272. 

33. Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; 
Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; et al. Clustal W and Clustal X version 2.0. 
Bioinformatics 2007, 23, 2947-2948. 

34. Hall, T.A. BioEdit: A user friendly biological sequence alignment editor and analyses program for 
Windows 95/98/NT. Nucleic Acids Symp. Ser. 1999, 41, 95-98. 

35. Rousset, F. GenePop'007: A complete re-implementation of the GenePop software for Windows 
and Linux. Mol. Ecol. Resour. 2008, 8, 103-106. 

36. Peakall, R.; Smouse, P.E. GENALEX 6: Genetic analysis in Excel. Population genetic software 
for teaching and research. Mol. Ecol. Notes 2006, 6, 288-295. 

37. Marshall, T.C.; Slate, J.; Kruuk, L.; Pemberton, J.M. Statistical confidence for likelihood-based 
paternity inference in natural populations. Mol. Ecol. 1998, 7, 639-655. 

38. Excoffier, L.; Laval, G.; Schneider, S. Arlequin, ver. 3.0: An integrated software package for 
population genetics data analysis. Evol. Bioinform. Online 2005, 1, 47-50. 



© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article 
distributed under the terms and conditions of the Creative Commons Attribution license 
(http://creativecommons.org/licenses/by/3.0/). 



